Temporal Abstraction in TD Networks
Authors
Abstract
Temporal-difference (TD) networks have been proposed as a way of representing and learning a wide variety of predictions about the interaction between an agent and its environment (Sutton & Tanner, 2005). These predictions are compositional in that their targets are defined in terms of other predictions, and subjunctive in that they are about what would happen if an action or sequence of actions were taken. In conventional TD networks, the inter-related predictions are at successive time steps and contingent on a single action; here we generalize them to accommodate extended time intervals and contingency on whole ways of behaving. Our generalization is based on the options framework for temporal abstraction (Sutton, Precup & Singh, 1999). The primary contribution of this paper is to introduce a new algorithm for intra-option learning in TD networks with function approximation and eligibility traces. We present empirical examples of our algorithm’s effectiveness and of the greater representational expressiveness of temporally-abstract TD networks.
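To make the compositional and subjunctive properties concrete, here is a minimal sketch of one step of a conventional (one-step, single-action) TD-network update with linear function approximation. The question network (`targets`), conditioning actions (`cond_action`), and all parameter values are hypothetical illustrations, not the paper's actual experimental setup; the temporally-abstract (option-conditional) generalization the paper introduces is not shown.

```python
import numpy as np

n_nodes, n_features = 3, 4
alpha = 0.1                               # step size (illustrative)

# targets[i] names the node whose next-step prediction node i forecasts;
# -1 means node i predicts the raw next observation (compositionality).
targets = [-1, 0, 1]                      # hypothetical question network
cond_action = [0, 0, 1]                   # action each node is contingent on

def predict(W, x):
    return W @ x                          # y_i(t) = w_i . x(t), one row per node

def td_network_step(W, x, x_next, obs_next, action):
    """One TD(0) update of the answer network's weights W."""
    y = predict(W, x)
    y_next = predict(W, x_next)
    for i in range(n_nodes):
        if action != cond_action[i]:      # subjunctive: only update nodes whose
            continue                      # conditioning action was actually taken
        z = obs_next if targets[i] == -1 else y_next[targets[i]]
        W[i] += alpha * (z - y[i]) * x    # move prediction toward its target
    return W
```

Note how node 1's target is node 0's next-step prediction, so knowledge is defined in terms of other predictions rather than only raw observations.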
Similar papers
TD Models: Modeling the World at a Mixture of Time Scales
Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in reinforcement learning, but also to predict states, i.e., to learn a model of the world's dynamics. We present theory and algorithms for intermixing TD models of the world at different levels of temporal abstraction within a single structure. Such multi-scale TD models can be used in model-based rein...
Using Decision Trees as the Answer Networks in Temporal Difference-Networks
State representation for intelligent agents is a continuous challenge as the need for abstraction is unavoidable in large state spaces. Predictive representations offer one way to obtain state abstraction by replacing a state with a set of predictions about future interactions with the world. One such formalism is the Temporal-Difference Networks framework [2]. It splits the representation of k...
TD(λ) Networks: Temporal-Difference Networks with Eligibility Traces
Temporal-difference (TD) networks have been introduced as a formalism for expressing and learning grounded world knowledge in a predictive form (Sutton & Tanner, 2005). Like conventional TD(0) methods, the learning algorithm for TD networks uses 1-step backups to train prediction units about future events. In conventional TD learning, the TD(λ) algorithm is often used to do more general multi-s...
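The multi-step backups that TD(λ) adds over 1-step TD(0) can be sketched with a standard accumulating eligibility-trace update for a single linear prediction. Function and parameter names here are illustrative, not from the cited paper:

```python
import numpy as np

def td_lambda_update(w, e, x, r, x_next, alpha=0.1, gamma=0.9, lam=0.8):
    """One TD(lambda) step: trace e spreads the TD error over past features."""
    delta = r + gamma * (w @ x_next) - (w @ x)   # one-step TD error
    e = gamma * lam * e + x                      # accumulating eligibility trace
    w = w + alpha * delta * e                    # credit all recently active features
    return w, e
```

With `lam=0`, `e` reduces to the current feature vector `x` and the update collapses to the 1-step TD(0) backup used in the original TD-network learning algorithm.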
Automatic Acquisition of TD-Network in POMDP Environments: Extension with SRN Structure (牧野貴樹)
We propose a new neural network architecture, Simple recurrent TD Networks (SR-TDNs), that learns to predict future observations in partially observable environments, using proto-predictive representation of states. SR-TDNs incorporate the structure of simple recurrent neural networks (SRNs) into temporal-difference (TD) networks to use proto-predictive representation of states. Our simulation ...